For our independent variables, we use the per capita crime rate by town, the average number of rooms per dwelling, the Charles River dummy variable (= 1 if the tract bounds the river; 0 otherwise), the nitrogen oxides concentration (parts per 10 million), and a constant.
Below we write a function that computes: OLS point estimates for the intercept, the slope parameters, and the error variance; suitable test statistics with corresponding p-values for the relevant coefficients; and confidence intervals for the coefficients at the 95% level.
OLS <- function(X, Y) {
  # OLS coefficient estimates: (X'X)^(-1) X'Y
  beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y
  Y_hat <- X %*% beta_hat # fitted values
  e <- Y - Y_hat # residuals
  n <- nrow(X) # number of observations
  k <- ncol(X) # number of coefficients, including the intercept
  s <- as.numeric(t(e) %*% e / (n - k)) # error variance estimate: SSR / (n - k)
  sigma <- s * solve(t(X) %*% X) # variance-covariance matrix of beta hat
  se <- sqrt(diag(sigma)) # standard errors
  t_stat <- beta_hat / se # t-statistics for H0: beta_j = 0
  p <- 2 * pt(abs(t_stat), n - k, lower.tail = FALSE) # two-sided p-values
  # 95% confidence intervals
  th <- qt(0.975, n - k)
  conf <- cbind(beta_hat - th * se, beta_hat + th * se)
  colnames(beta_hat) <- "estimate"
  colnames(conf) <- c("2.5%", "97.5%")
  colnames(t_stat) <- "t-statistic"
  colnames(p) <- "p-value"
  error_variance <- s
  list(rbind(beta_hat, error_variance), cbind(t_stat, p), conf)
}
The output of the function is presented below:
OLS(X,Y)
## [[1]]
## estimate
## Constant -17.259625
## Crime -0.184610
## Rooms 7.706840
## Charles_River 4.673807
## NO_pp10m -14.960358
## error_variance 35.771378
##
## [[2]]
## t-statistic p-value
## Constant -5.373025 5.935625e-08
## Crime -5.358215 6.414580e-08
## Rooms 19.155659 4.233091e-62
## Charles_River 4.388052 6.974274e-06
## NO_pp10m -5.674181 1.178869e-08
##
## [[3]]
## 2.5% 97.5%
## Constant -23.5707812 -10.9484684
## Crime -0.2523012 -0.1169189
## Rooms 6.9163879 8.4972927
## Charles_River 2.5811627 6.7664511
## NO_pp10m -20.1404236 -9.7802928
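As a quick sanity check (not part of the original output), the closed-form estimator can be compared against R's built-in `lm()` on simulated data; the sample size and coefficient values below are illustrative:

```r
# Simulate a small design matrix with an intercept column and two covariates,
# then compare the normal-equations estimate with lm()'s QR-based estimate.
set.seed(1)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n)) # design matrix, first column is the constant
beta_true <- c(1, 2, -0.5)
Y <- X %*% beta_true + rnorm(n)

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y # closed-form OLS
fit <- lm(Y ~ X[, 2] + X[, 3]) # built-in OLS

all.equal(as.numeric(beta_hat), unname(coef(fit))) # TRUE
```

Both approaches solve the same least-squares problem, so the estimates agree up to numerical precision.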
We consider 6 different countries and an indicator of high trade between them. For illustration, we use a rule where there is a directed edge from A to B if B is one of A's top 5 export destinations. We do not actually check the biggest trading partners empirically, but let's assume that such a procedure gives rise to the following network (the arrows are mostly made up to achieve 12 edges and nice interpretations):
The corresponding adjacency matrix is:
## US MX DE AT CN IN
## US 0 1 0 0 1 0
## MX 1 0 0 0 1 0
## DE 1 0 0 0 0 0
## AT 1 0 1 0 1 0
## CN 1 0 1 0 0 0
## IN 1 0 0 0 1 0
A very basic concept of centrality defines an agent as most central if it has the highest number of directed edges pointing towards itself (i.e. if it is an important export destination for most other countries). Using this criterion, we can see from the graph, or from the columns of the adjacency matrix, that the US is the most central agent in this network, because it is a top 5 trading partner for all 5 other countries. The least central agents are Austria and India, since they are not a top 5 export destination for any country in this network.
Another basic criterion of centrality is the number of outward-pointing arrows of an agent: a country would be considered central if it exports to the highest number of countries in this network. In this sense, Germany is least central with only 1 outward arrow, while Austria is most central with 3 outward arrows.
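Both degree counts can be read off the adjacency matrix directly; a minimal sketch (the matrix is rebuilt from the output above):

```r
# Build the adjacency matrix from the printed output and compute
# in-degree (column sums) and out-degree (row sums) centralities.
countries <- c("US", "MX", "DE", "AT", "CN", "IN")
A <- matrix(c(0, 1, 0, 0, 1, 0,
              1, 0, 0, 0, 1, 0,
              1, 0, 0, 0, 0, 0,
              1, 0, 1, 0, 1, 0,
              1, 0, 1, 0, 0, 0,
              1, 0, 0, 0, 1, 0),
            nrow = 6, byrow = TRUE,
            dimnames = list(countries, countries))

in_degree <- colSums(A)  # edges pointing towards each country
out_degree <- rowSums(A) # edges leaving each country
in_degree
out_degree
```

The US has in-degree 5 while Austria and India have in-degree 0; Austria has out-degree 3 while Germany has out-degree 1, matching the discussion above.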
Eigenvector centrality: the network has no sink, but it is not possible to reach Austria (it has no incoming edges), so its eigenvector centrality is zero. PageRank: the edge weights (transition probabilities) still need to be defined, for instance by row-normalizing the adjacency matrix.
Germany, which had only 1 outgoing connection, is the only row with a sum of 1, so its edge is the only one that survives row normalization with a weight of 1. Thus, depending on the criterion, the US or Germany would be the most central agent here, and all other countries would be least central.
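The row normalization can be sketched as follows (rebuilding the adjacency matrix from above; `adj.rn` is the object name the later simulation code uses):

```r
# Row-normalize the adjacency matrix: divide each row by its out-degree,
# so every row sums to 1 and can be read as transition probabilities.
countries <- c("US", "MX", "DE", "AT", "CN", "IN")
A <- matrix(c(0, 1, 0, 0, 1, 0,
              1, 0, 0, 0, 1, 0,
              1, 0, 0, 0, 0, 0,
              1, 0, 1, 0, 1, 0,
              1, 0, 1, 0, 0, 0,
              1, 0, 0, 0, 1, 0),
            nrow = 6, byrow = TRUE,
            dimnames = list(countries, countries))

adj.rn <- A / rowSums(A) # element (i, j) becomes A[i, j] / out-degree of i
rowSums(adj.rn)          # every row now sums to 1
adj.rn["DE", ]           # Germany's single edge keeps weight 1
```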
Based on inflowing arrows, China becomes the most central agent, and India, Austria, and Mexico the least central agents. Based on outflowing arrows, Austria is still the most central agent, while Germany is the least central agent.
We simulate the data based on the row-normalized adjacency matrix for trade partner connections that we created in the preceding exercise.
Let's say we are interested in estimating the impact of a country's fentanyl precursor production (x) on the log of a country's drug deaths (y). Then Wx denotes the average fentanyl precursor production of neighboring countries. We hypothesize that if the fentanyl precursor production of a country's close trade partners is high (as measured by Wx), then its fentanyl-related drug deaths are high. Moreover, there are spillover effects with regard to fentanyl-related deaths in connected trading partners (Wy).
We specify the following linear-in-means model to estimate the relationship. \[y = Wy \lambda + Wx \delta + x\beta + \varepsilon\]
where \(Wy\) denotes the average drug death rate of countries that are close neighbors.
To simulate the values for y, we need to solve for the reduced form. \[y = (I - W\lambda)^{-1}(Wx \delta + x\beta + \varepsilon)\]
set.seed(123)
# number of agents
N <- 6
# parameter definitions
sigma2 <- 1
lambda <- 0.6
delta <- 0.3
beta <- 2 # true beta is 2
W <- adj.rn
simulations <- 10000
result <- numeric(simulations)
for (i in seq_len(simulations)) {
  # simulation of the random vectors (rnorm takes the standard deviation)
  x <- rnorm(N, 0, 1)
  e <- rnorm(N, 0, sqrt(sigma2))
  # calculating neighborhood means
  Wx <- W %*% x
  # calculating S = I - lambda * W
  S <- diag(N) - lambda * W
  # generating y via the reduced form
  y <- solve(S, Wx * delta + x * beta + e)
  model <- lm(y ~ x)
  result[i] <- coef(model)["x"]
}
inconsistent_estimate <- mean(result)
my_data <- data.frame(
  Name = c("Simulated coef", "Real coef"),
  Estimate = c(inconsistent_estimate, beta)
)
knitr::kable(my_data, format = "markdown")
inconsistent_estimate <- mean(result)
my_data <- data.frame(
Name = c("Simulated coef", "Real coef"),
Estimate = c(inconsistent_estimate, beta)
)
knitr::kable(my_data, format = "markdown")
| Name | Estimate |
|---|---|
| Simulated coef | 1.718638 |
| Real coef | 2.000000 |
We set the true value of \(\beta\) to 2, \(\lambda\) to 0.6, and \(\delta\) to 0.3. We then regress y on x, repeat this 10,000 times, and find that the mean of the estimates is 1.718638. Thus the simple linear regression fails to recover the true value of \(\beta\).
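As a cross-check (a sketch, not part of the original exercise): with the spillover parameters switched off (\(\lambda = 0\), \(\delta = 0\)), the same regression of y on x does recover \(\beta\), which confirms that the bias above comes from the omitted network terms:

```r
# With lambda = 0 and delta = 0 the model collapses to y = x*beta + e,
# so OLS of y on x is unbiased for beta.
set.seed(123)
N <- 6
beta <- 2
est <- replicate(2000, {
  x <- rnorm(N)
  e <- rnorm(N)
  y <- beta * x + e # no spatial lag, no contextual effect
  coef(lm(y ~ x))[["x"]]
})
mean(est) # approximately 2
```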
Download a suitable shapefile for NUTS2 regions (from here or using R directly) and some dataset of interest at the same level of aggregation (e.g. from here). Install and load the sf and ggplot2 packages together with their dependencies.
## Reading layer `NUTS_RG_03M_2021_4326_LEVL_2' from data source
## `/Users/gustavpirich/Desktop/GITHUB/spatial_econ/assignment_1/data'
## using driver `ESRI Shapefile'
## Simple feature collection with 334 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -63.15176 ymin: -21.38696 xmax: 55.83509 ymax: 80.83426
## Geodetic CRS: WGS 84
## [1] "WGS 84"
## [1] "EPSG:27700"
The projection used is the “World Geodetic System 1984”, or “WGS 84” for short (EPSG:4326). We then transform the data to the British National Grid projection (EPSG:27700), matching the second EPSG code printed above.
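A reprojection along these lines can be sketched with sf's `st_transform()`; the single point below is illustrative, and 27700 is the EPSG code printed above:

```r
library(sf)

# An illustrative point (Vienna) in WGS 84 longitude/latitude.
pt <- st_sfc(st_point(c(16.37, 48.21)), crs = 4326)

# Transform to the British National Grid (EPSG:27700).
pt_bng <- st_transform(pt, 27700)
st_crs(pt_bng)$epsg # 27700
```

For a whole shapefile the call is the same: `st_transform(shape, 27700)` reprojects every geometry in the object.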
Both maps show the agricultural value added in million euros at the NUTS-2 level in 2021. The first map visualizes the value added on a continuous scale, while the second map portrays it with a binned (discrete) scale. Note that animal production and fishing are also included, which accounts for the high agricultural value added in western France.
There are two types of spatial data: raster and vector data. Raster graphics can be stored as PNG or JPEG files, while vector graphics can be stored as SVG (Scalable Vector Graphics) or EPS (Encapsulated PostScript) files. Vector graphics are described by mathematical equations for shapes, colours, and lines. An advantage of vector graphics is that they do not lose resolution when you zoom in on them.
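The difference can be illustrated with base R graphics devices (a minimal sketch; temporary file paths are used here so no project files are touched):

```r
# Write the same plot once as a raster file (PNG) and once as a vector file (SVG).
raster_file <- tempfile(fileext = ".png")
vector_file <- tempfile(fileext = ".svg")

png(raster_file, width = 800, height = 600) # raster: a fixed grid of pixels
plot(1:10, (1:10)^2, type = "b")
dev.off()

svg(vector_file, width = 6, height = 4) # vector: shapes stored as drawing commands
plot(1:10, (1:10)^2, type = "b")
dev.off()

file.exists(raster_file) # TRUE once the device is closed
```

Zooming into the PNG reveals pixels, while the SVG can be rescaled arbitrarily without loss of resolution.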
## Warning in brewer.pal(n = 10, name = "Blues"): n too large, allowed maximum for palette Blues is 9
## Returning the palette you asked for with that many colors
## Warning in brewer.pal(n = 10, name = "Reds"): n too large, allowed maximum for palette Reds is 9
## Returning the palette you asked for with that many colors
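These warnings appear because the ColorBrewer sequential palettes cap out at 9 colours. A workaround using only base R is to interpolate a longer palette with `colorRampPalette()`; the hex values below are assumed to be the light and dark endpoints of the “Blues” palette:

```r
# Interpolate a 10-colour gradient between two endpoint colours
# (colorRampPalette is in base R's grDevices, no extra package needed).
blues10 <- colorRampPalette(c("#F7FBFF", "#08306B"))(10)
length(blues10) # 10
```

Requesting `brewer.pal(9, "Blues")` and feeding all 9 colours into `colorRampPalette()` would work the same way and also silences the warning.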
# Preliminaries
install.packages("spDataLarge", repos = "https://geocompr.r-universe.dev")
library(tmap)
library(spDataLarge)
data("pol_pres15")
summary(pol_pres15$I_Duda_share)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.06433 0.30362 0.38993 0.40091 0.48993 0.78566
summary(pol_pres15$I_Komorowski_share)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.03747 0.20285 0.29594 0.29540 0.38709 0.66901
par(mfrow = c(1, 2))
hist(pol_pres15$I_Duda_share, main = "Support for Duda", xlab = "Support")
hist(pol_pres15$I_Komorowski_share, main = "Support for Komorowski", xlab = "Support")
par(mfrow = c(1, 1))
# Support for Komorowski and Duda

## Support in the first round
The first visualization shows the support for the two most promising candidates for the Polish presidency, Duda and Komorowski. One can observe that the support for Duda came from the southern and eastern parts of Poland, while the north and west of the country supported Komorowski:
## Support in the second round
In the second round, we again show the outcome of the election. Observing the map, we can see that in some municipalities the majority for one candidate flipped to the opposing candidate and vice versa. What persists across the first and second rounds, however, is that Duda won the majority in the southern and eastern parts of Poland, while Komorowski got the majority of the votes in the northern and western parts of the country:
## Possible issues with voting envelopes
The next map shows possible issues with voting envelopes. Specifically, we wanted to investigate whether all envelopes sent were also received. If the number of envelopes received equals the number of envelopes sent, the municipality is visualized in green. If the municipality is visualized in grey, no voting envelopes were used for the election there. If a municipality is red, however, the total number of envelopes received falls below the number of envelopes sent (at a threshold of 99 percent), which indicates that there were probably issues with the delivery of the voting envelopes. We can observe red clusters especially in urban regions (Warsaw, Katowice, and Krakow). An interesting side fact is that one can clearly see that in urban areas, voting envelopes were used more often than in rural areas of Poland: